Robust Emotional Stressed Speech Detection Using Weighted Frequency Subbands

Authors

  • John H. L. Hansen
  • Wooil Kim
  • Mandar Rahurkar
  • Evan Ruzanski
  • James Meyerhoff
Abstract

Detecting psychological stress from speech is challenging because speakers differ in how they convey stress. Changes in speech production due to speaker state are not linearly dependent on changes in stress. Research is further complicated by the existence of different stress types and the lack of metrics capable of discriminating stress levels. This study addresses the automatic detection of speech under stress using a previously developed feature extraction scheme based on the Teager Energy Operator (TEO). To improve detection performance, two schemes are proposed: (i) a weighting scheme over selected, partitioned frequency sub-bands and (ii) a weighting scheme over all frequency bands. Using the traditional TEO-based feature vector with a closed-speaker Hidden Markov Model-trained stressed speech classifier, error rates of 22.5/13.0% for stress/neutral speech are obtained. With the new weighted sub-band detection scheme, closed-speaker error rates are reduced to 4.7/4.6% for stress/neutral detection, a relative error reduction of 79.1/64.6%, respectively. For the open-speaker case, stress/neutral speech detection error rates of 69.7/16.2% using traditional features are reduced to 13.1/4.0% (a relative 81.3/75.4% reduction) with the proposed automatic frequency sub-band weighting scheme. Finally, the effects of speaker-dependent/independent scenarios, vowel duration, and mismatched vowel type on stress detection performance are discussed.
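As a point of reference for the operator named in the abstract, the standard discrete-time Teager Energy Operator can be sketched as below. This is only the basic operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1); the paper's full TEO-based feature extraction and sub-band weighting are not described in the abstract and are not reproduced here. The function name `teager_energy` and the test-tone parameters are illustrative assumptions.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is 2 samples
    shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Illustrative check on a pure tone (parameters chosen arbitrarily):
# for x[n] = A * cos(omega * n), the operator yields the constant
# A^2 * sin(omega)^2, which depends on both amplitude and frequency.
n = np.arange(200)
amplitude, omega = 0.5, 0.3
tone = amplitude * np.cos(omega * n)
psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

Because the output for a sinusoid depends on both amplitude and frequency, the operator responds to changes in either, which is one reason it is applied per frequency band in sub-band analyses.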


Similar resources

Frequency band analysis for stress detection using a teager energy operator based feature

Studies have shown that the performance of speech recognition algorithms degrades severely in the presence of task- and emotion-induced stress in adverse conditions. This paper addresses the problem of detecting the presence of stress in speech by analyzing nonlinear feature characteristics in specific frequency bands. The framework of the previously derived Teager Energy Operator (TEO) base...


Voice Activity Detection Using Spectral Entropy in Bark-Scale Wavelet Domain

In this paper, a novel entropy-based voice activity detection (VAD) algorithm is presented for variable-level noise environments. Since the energy of different types of noise is concentrated in different frequency subbands, the effect of corrupting noise on each frequency subband is different. It is found that the seriously obscured frequency subbands have little word signal information left, and...


Subband architecture for automatic speaker recognition

We present an original approach for automatic speaker identification especially applicable to environments which cause partial corruption of the frequency spectrum of the signal. The general principle is to split the whole frequency domain into several subbands on which statistical recognizers are independently applied and then recombined to yield a global score and a global recognition decisio...


Generating stressed speech from neutral speech using a modified CELP vocoder

This paper addresses the problem of speech modeling for generating stressed speech using a source generator framework. In general, stress in this context refers to emotional or task-induced speaking conditions. Throughout this particular study, the focus will be limited to speech under angry, loud, and Lombard effect (i.e., speech produced in noise) speaking conditions. Source generator th...


A long, deep and wide artificial neural net for robust speech recognition in unknown noise

A long deep and wide artificial neural net (LDWNN) with multiple ensemble neural nets for individual frequency subbands is proposed for robust speech recognition in unknown noise. It is assumed that the effect of arbitrary additive noise on speech recognition can be approximated by white noise (or speech-shaped noise) of similar level across multiple frequency subbands. The ensemble neural nets...



Journal:
  • EURASIP J. Adv. Sig. Proc.

Volume 2011

Publication date: 2011